1. INTRODUCTION

Madura is an island in East Java with four districts: Bangkalan, Sampang, Pamekasan, and Sumenep.
Its tourist attractions cover natural tourism, religious tourism, and culinary tourism. One of the
religious tourism sites in Sumenep is Asta Tinggi, a burial place for the kings of the Sumenep kingdom
which was founded in 1644. It is located in the village of Kebon Agung, in the Sumenep city sub-district.
Sumenep also has Gili Labak Island, a natural tourist attraction that became famous in mid-2014 for its
underwater beauty and white sand. In 2019, data from the Sumenep government showed that the number of
tourist visits was 840,950, consisting of 839,398 domestic tourists and 1,507 foreign tourists. Increased
tourist visits are an important source of economic development [1]-[3], employment [4] and government
revenues [5]. Forecasting tourist visits is necessary for planning and future decision making. Accurate
forecasts therefore matter to agencies and industries in the tourism sector, such as hospitality [6] and
transportation, which need to monitor and anticipate trends in demand for tourist visits. Forecasting
also helps in planning new business opportunities in travel [7].

Several previous studies have forecast tourist visits, for example using the empirical mode
decomposition (EMD) method integrated with artificial neural networks (ANN) [8]. Most fluctuations in
tourist visits tend to be non-linear and are influenced by several factors, such as economic, seasonal
and political conditions. EMD can accommodate the fluctuation and complexity of these factors [9]. EMD,
which has recently been revisited in [10], has been applied in many different fields [11], [12]. However,
EMD often produces mode mixing, so that the extracted components sometimes do not match the data
pattern. This is a weakness of EMD. The ensemble empirical mode decomposition (EEMD) method was studied
in [13], [14] to address this weakness by adding white noise to the data. Forecasting that integrates
EEMD with artificial neural networks has been carried out to improve accuracy, as in [15], [16]. In
[17], crude oil prices were predicted using EEMD and neural networks, and the forecasting results of
several neural network learning methods were compared. The combination of a feed-forward neural network
(FNN) with the Polak-Ribière conjugate gradient (PCG) learning process produced faster training and
better forecast accuracy than the other learning methods. PCG starts from the negative of the network
gradient in the first iteration and then searches along conjugate directions.

However, neural network methods often suffer from over-fitting and local optima, and are sensitive to
parameter selection. A useful alternative is to combine ANN with a genetic algorithm (GA), in which the
GA is used to optimize the weights and biases of the neural network. The combination of ANN and GA has
been investigated in various fields, such as biology [18], traffic emissions [19] and construction [20].
This study aims to forecast tourist visits. Its novelty is that tourist visits are forecast using EEMD
and an artificial neural network whose weights and biases are optimized by a GA. The remainder of this
paper is structured as follows: section 2 describes the research method, section 3 presents the results
and discussion, and section 4 gives the conclusion.


2. METHOD 

To develop a good forecasting system for tourist visits, we propose a combination of the EEMD, ANN
and GA methods. The complete proposed tourist visit forecasting algorithm is shown in Figure 1. The data
used are monthly tourist visits in Sumenep Regency, obtained from the Sumenep district government for
January 2015 to December 2019. The time series of tourist visits is shown in Figure 2.

Figure 1. Tourist visit forecasting algorithm


2.1. Ensemble empirical mode decomposition 

EEMD is a noise-assisted data analysis method that eliminates the mode mixing phenomenon and recovers
the true frequency distribution of the original signal. EEMD, proposed in [13], is an improvement over
the EMD method. Its principle is to add white noise, which is distributed evenly throughout the
frequency space, to the data. EEMD decomposes the data into a finite number of simple, orthogonal
oscillation modes called intrinsic mode functions (IMFs). An IMF must satisfy two requirements: (1) the
number of extrema and the number of zero-crossings must be equal or differ by at most one, and (2) the
mean value of the upper envelope and the lower envelope is zero everywhere. The EEMD algorithm
decomposes the original data into several IMFs and a residue. The EEMD algorithm is as follows:
Step 1: Initialize the number of ensembles M.
Step 2: Generate white noise.
Step 3: Add the white noise to the data and decompose the noisy data into IMFs using EMD.
Step 4: Repeat steps 2 and 3 with different white noise until m=M.
Step 5: Find the ensemble mean of each IMF over the M trials using (1).
Step 6: Find the ensemble mean of the residue over the M trials using (2).

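$$\bar{c}_i = \frac{1}{M}\sum_{m=1}^{M} c_{i,m}, \quad i = 1, \ldots, n \qquad (1)$$

$$\bar{r}_n = \frac{1}{M}\sum_{m=1}^{M} r_{n,m} \qquad (2)$$

where M is the number of ensembles, m is the ensemble index, i is the IMF index, n is the residue index,
c is an IMF, $\bar{c}$ is the ensemble mean of each IMF, r is the residue and $\bar{r}$ is the ensemble
mean of the residue.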
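
The ensemble-averaging steps and equations (1) and (2) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in this study: `toy_decompose` is a hypothetical stand-in that is not a real EMD sifting procedure, every trial is assumed to return the same number of components, and the noise standard deviation is treated as an absolute value.

```python
import random

def toy_decompose(signal):
    """Hypothetical stand-in for EMD (NOT a real sifting procedure):
    returns one detail component and a 3-point moving-average residue."""
    n = len(signal)
    residue = []
    for t in range(n):
        window = signal[max(0, t - 1):min(n, t + 2)]
        residue.append(sum(window) / len(window))
    detail = [s - r for s, r in zip(signal, residue)]
    return [detail], residue

def eemd_ensemble_mean(signal, decompose, M=100, noise_std=0.2, seed=0):
    """Average the components of M noise-perturbed decompositions."""
    rng = random.Random(seed)
    imf_sums = None
    residue_sum = [0.0] * len(signal)
    for _ in range(M):                                    # step 4: repeat M times
        # steps 2-3: add fresh white noise, then decompose the noisy copy
        noisy = [s + rng.gauss(0.0, noise_std) for s in signal]
        imfs, residue = decompose(noisy)
        if imf_sums is None:
            imf_sums = [[0.0] * len(signal) for _ in imfs]
        for acc, imf in zip(imf_sums, imfs):
            for t, value in enumerate(imf):
                acc[t] += value
        for t, value in enumerate(residue):
            residue_sum[t] += value
    mean_imfs = [[v / M for v in acc] for acc in imf_sums]  # equation (1)
    mean_residue = [v / M for v in residue_sum]             # equation (2)
    return mean_imfs, mean_residue

series = [3.0, 5.0, 4.0, 6.0, 8.0, 7.0, 9.0, 12.0]
mean_imfs, mean_residue = eemd_ensemble_mean(series, toy_decompose)
```

A real application would pass an actual EMD routine as `decompose`; the averaging logic stays the same.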
Figure 2. The original data of tourist visits


2.2. Data normalization 

The decomposition produces several IMFs and a residue. Each IMF and the residue are then normalized
using (3). Normalization scales the input values into the range of the activation function used in
neural network training.


$$x' = \frac{(y_{max} - y_{min})(x - x_{min})}{x_{max} - x_{min}} + y_{min} \qquad (3)$$

where $x'$ is the normalized value, $y_{min}$ and $y_{max}$ are the minimum and maximum values of the
activation function, x is the original data value, and $x_{min}$ and $x_{max}$ are the minimum and
maximum of the original data.
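
As an illustration of (3) and of its inverse, which is needed later for denormalization, a short Python sketch is given below. The target range of 0 to 1 is an assumption made for the example; the activation range actually used is not stated in the text.

```python
def normalize(values, y_min=0.0, y_max=1.0):
    """Min-max scaling of equation (3); the [0, 1] default range is an assumption."""
    x_min, x_max = min(values), max(values)
    if x_max == x_min:
        raise ValueError("cannot normalize a constant series")
    scale = (y_max - y_min) / (x_max - x_min)
    scaled = [(x - x_min) * scale + y_min for x in values]
    return scaled, x_min, x_max

def denormalize(scaled, x_min, x_max, y_min=0.0, y_max=1.0):
    """Inverse of equation (3): map scaled values back to the original units."""
    scale = (x_max - x_min) / (y_max - y_min)
    return [(v - y_min) * scale + x_min for v in scaled]

scaled, x_min, x_max = normalize([10.0, 20.0, 30.0])
restored = denormalize(scaled, x_min, x_max)
```

The minimum and maximum of the original series must be kept, because the inverse mapping cannot be computed without them.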


2.3. Artificial neural network 
The ANN is trained as an FNN whose learning is optimized using PCG. PCG is a neural network learning
method based on the conjugate gradient technique investigated by Polak and Ribière. It starts from the
negative of the network gradient in the first iteration and then searches along conjugate directions
[21]. The FNN learning algorithm with PCG optimization is:
Step 1: Initialize all weights.
Step 2: Do steps 3 through 6 while epoch <= 10000 or learning rate >= 0.1.
Step 3: Find the hidden layer output using (4), (5).
Step 4: Find the output of each output layer using (6), (7).
Step 5: Find the error factor in the output layer using (8), (9).
Step 6: Find the error factor in the hidden layer using (10), (11), (12).
Step 7: Find the gradient on the output layer using (13).
Step 8: Find the gradient on the hidden layer using (14).
Step 9: Find the parameter β for all neurons using (15).
Step 10: Find the directions for all neurons using (16), (17).
Step 11: Determine the α parameter for all neurons.
Step 12: Update the weights using (18).
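
The direction and weight updates in steps 9 to 12 can be illustrated with a small Python sketch of the textbook Polak-Ribière conjugate gradient method. It is not the network training code of this study: it minimizes a toy quadratic function instead of an FNN error, it uses the standard Polak-Ribière formula for β, and it chooses α by an exact line search, which is only possible because the objective is quadratic. All function names are hypothetical.

```python
def polak_ribiere_beta(g_new, g_old):
    """Textbook Polak-Ribiere formula: g_new.(g_new - g_old) / (g_old.g_old)."""
    numerator = sum(gn * (gn - go) for gn, go in zip(g_new, g_old))
    denominator = sum(go * go for go in g_old)
    return numerator / denominator

def minimize_quadratic(A, b, w, iterations=10, tol=1e-12):
    """Minimize f(w) = 0.5 w^T A w - b^T w for a symmetric positive definite A."""
    size = len(w)

    def gradient(point):
        return [sum(A[i][j] * point[j] for j in range(size)) - b[i]
                for i in range(size)]

    g = gradient(w)
    d = [-gi for gi in g]                 # first direction: negative gradient
    for _ in range(iterations):
        if sum(gi * gi for gi in g) < tol:
            break                         # gradient is numerically zero
        Ad = [sum(A[i][j] * d[j] for j in range(size)) for i in range(size)]
        # step size alpha from an exact line search along d
        alpha = -sum(gi * di for gi, di in zip(g, d)) / sum(
            di * adi for di, adi in zip(d, Ad))
        w = [wi + alpha * di for wi, di in zip(w, d)]       # weight update
        g_new = gradient(w)
        beta = polak_ribiere_beta(g_new, g)                 # parameter beta
        d = [-gn + beta * di for gn, di in zip(g_new, d)]   # new conjugate direction
        g = g_new
    return w

solution = minimize_quadratic([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0], [0.0, 0.0])
```

In network training the gradient comes from backpropagation and α is usually found by an approximate line search, but the β and direction updates follow the same pattern.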
2.3. Genetic algorithm
GA is used to improve the weight values of the artificial neural network in forecasting tourist visits.
The weight values of each ANN layer become the chromosome values in the GA, and this set of weights
forms the population that is optimized by the GA [22]. The steps of the GA in optimizing the ANN
architecture are as follows:
Step 1: Determine the population size by trial and error. Each chromosome has two sets of genes,
representing the input layer neurons and the hidden layer neurons. There are 4 input neurons and
30 hidden neurons, which are represented in the population used in the GA.
Step 2: Evaluate the fitness function for each individual in the population.
Step 3: Select the two individuals in the latest generation with the highest fitness values.
Step 4: Perform crossover and mutation to reproduce individuals for the next generation.


ISSN: 2302-9285 




Step 5: Return to step 2 until the number of generations reaches 100 or the RMSE is less than 0.0001.
Step 6: Decode the individuals that converged in the last generation.
Step 7: The optimized neural network architecture is formed.
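
The GA loop described in the steps above can be sketched in Python as follows. This is a simplified illustration under several assumptions that are not taken from the text: the chromosome holds the weights and bias of a single linear neuron rather than a 4-30-1 network, fitness is the RMSE on a toy data set, crossover is uniform, mutation adds Gaussian noise with standard deviation 0.1, and the two fittest individuals are copied unchanged into the next generation. The function names are hypothetical.

```python
import math
import random

def neuron_rmse(weights, samples):
    """RMSE of a single linear neuron; the last gene is the bias (toy fitness)."""
    total = 0.0
    for inputs, target in samples:
        output = sum(w * x for w, x in zip(weights, inputs)) + weights[-1]
        total += (output - target) ** 2
    return math.sqrt(total / len(samples))

def ga_optimize(samples, n_genes, pop_size=45, generations=100,
                p_cross=0.9, p_mut=0.2, tol=1e-4, seed=0):
    rng = random.Random(seed)
    # step 1: random initial population of weight vectors
    pop = [[rng.uniform(-1.0, 1.0) for _ in range(n_genes)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda c: neuron_rmse(c, samples))   # step 2: evaluate fitness
        if neuron_rmse(pop[0], samples) < tol:            # step 5: stop on low RMSE
            break
        parent_a, parent_b = pop[0], pop[1]               # step 3: two fittest
        children = [parent_a[:], parent_b[:]]             # elitism (an assumption)
        while len(children) < pop_size:                   # step 4: reproduce
            child = parent_a[:]
            if rng.random() < p_cross:                    # uniform crossover
                child = [a if rng.random() < 0.5 else b
                         for a, b in zip(parent_a, parent_b)]
            child = [g + rng.gauss(0.0, 0.1) if rng.random() < p_mut else g
                     for g in child]                      # Gaussian mutation
            children.append(child)
        pop = children
    return min(pop, key=lambda c: neuron_rmse(c, samples))

points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.25)]
samples = [([x1, x2], 2.0 * x1 - x2 + 0.5) for x1, x2 in points]
best = ga_optimize(samples, n_genes=3)
```

The default population size, number of generations, crossover probability, mutation probability and RMSE threshold mirror the values reported in this paper; everything else is illustrative.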


2.4. Data aggregation 
The IMFs and the residue obtained from the decomposition are recombined into a single series. This
recombination is called data aggregation and is carried out using the adaptive linear neural network
(Adaline) method, shown in (19):

$$y = \sum_{i} x_i w_i + b \qquad (19)$$

where $x_i$ are the inputs, $w_i$ are the weights and b is the bias.
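
A minimal Python sketch of the Adaline combination in (19) is shown below. It assumes the standard least-mean-squares (Widrow-Hoff) update for training; the learning rate, epoch count and toy data are illustrative choices, and the function names are hypothetical.

```python
def adaline_predict(x, w, b):
    """Equation (19): weighted sum of the inputs plus a bias."""
    return sum(xi * wi for xi, wi in zip(x, w)) + b

def adaline_fit(inputs, targets, lr=0.1, epochs=2000):
    """Train the weights and bias with the least-mean-squares rule (an assumption)."""
    w = [0.0] * len(inputs[0])
    b = 0.0
    for _ in range(epochs):
        for x, target in zip(inputs, targets):
            error = target - adaline_predict(x, w, b)
            w = [wi + lr * error * xi for wi, xi in zip(w, x)]
            b += lr * error
    return w, b

# Toy data: each row holds two component values and the target is their sum.
components = [[0.1, 0.2], [0.4, 0.1], [0.3, 0.5], [0.6, 0.2]]
totals = [0.3, 0.5, 0.8, 0.8]
weights, bias = adaline_fit(components, totals)
aggregated = [adaline_predict(x, weights, bias) for x in components]
```

In the aggregation step, each input row would hold the forecast values of the IMFs and the residue at one time step.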


2.5. Data denormalization 
Data denormalization returns the processed data to their real values. It is done by inverting (3) to
recover the value of x. We used two error measures to evaluate the performance of the tourist visit
forecasting model, namely mean squared error (MSE) and root mean squared error (RMSE), given in (20)
and (21). Besides the forecasting error, we also use the direction of the forecast movement to evaluate
the model, since it can assist decision making. The direction of the forecast movement is measured
using the directional statistic (Dstat), given in (22).


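$$MSE = \frac{1}{n}\sum_{t=1}^{n} (\hat{y}_t - x_t)^2 \qquad (20)$$

$$RMSE = \sqrt{\frac{1}{n}\sum_{t=1}^{n} (\hat{y}_t - x_t)^2} \qquad (21)$$

$$D_{stat} = \frac{1}{n}\sum_{t=1}^{n} a_t \times 100\% \qquad (22)$$

where n is the number of actual data points, $x_t$ is the actual data at time t, $\hat{y}_t$ is the
forecast data and $a_t$ is an indicator variable: $a_t = 1$ if
$(x_{t+1} - x_t)(\hat{y}_{t+1} - x_t) \geq 0$, and $a_t = 0$ if $(x_{t+1} - x_t)(\hat{y}_{t+1} - x_t) < 0$.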
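
The three evaluation measures can be computed with the short Python sketch below. One assumption is made for Dstat: the percentage is taken over the consecutive pairs of time steps, so a series of length n gives n-1 direction comparisons.

```python
import math

def mse(actual, forecast):
    """Mean squared error, equation (20)."""
    return sum((f - a) ** 2 for a, f in zip(actual, forecast)) / len(actual)

def rmse(actual, forecast):
    """Root mean squared error, equation (21)."""
    return math.sqrt(mse(actual, forecast))

def dstat(actual, forecast):
    """Directional statistic, equation (22), as a percentage of correct directions."""
    steps = len(actual) - 1            # assumption: one comparison per consecutive pair
    hits = 0
    for t in range(steps):
        actual_move = actual[t + 1] - actual[t]
        forecast_move = forecast[t + 1] - actual[t]
        if actual_move * forecast_move >= 0:
            hits += 1
    return 100.0 * hits / steps

actual = [1.0, 2.0, 3.0, 2.0]
forecast = [1.0, 0.5, 2.5, 2.5]
scores = (mse(actual, forecast), rmse(actual, forecast), dstat(actual, forecast))
```

In this toy example the forecast moves in the wrong direction at the first step only, so two of the three directions are counted as correct.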


3. RESULTS AND DISCUSSION 

In the EEMD method, the tourist visit forecasting experiment was carried out using 100 ensembles, a
standard deviation of 0.2, a lower threshold of 0.05, an upper threshold of 0.5 and a tolerance of 0.05.
The experimental results show that the data are decomposed into 4 IMFs and one residue, as shown in
Figure 3. Each IMF component has a different frequency. This decomposition assumes that each data set
consists of various intrinsic oscillation modes. Each intrinsic mode (linear or nonlinear) is an
oscillation that has the same number of extrema and zero-crossings and is symmetric with respect to the
local mean value.
Figure 3. Results of data decomposition


After the tourist visit data are decomposed, they are normalized and processed using the ANN. Several
experiments were conducted to determine the ANN structure, and the best one was selected, namely 4
input neurons, 30 hidden neurons and one output neuron. The parameters used in the experiment are an
error tolerance of 0.0001, 10,000 epochs and a learning rate of 0.1. The best forecast performance of
the ANN before optimization using GA is an MSE of 0.02130 and an RMSE of 0.14591, as shown in Table 1.
This performance is compared with the forecast performance after the weight values are optimized
using GA.
Table 1. The performance of ANN before optimization using GA
Data pattern   MSE       RMSE      Dstat
4-30-1         0.02130   0.14591   64.29%


GA is used to improve the weight values of the neurons in the input layer and hidden layer of the ANN.
The first experiment tunes the population size to find the best performance. The best population size
is then used in the next experiment, which tunes the crossover and mutation probabilities. Varying the
population size gave the best performance at a population of 45, with an MSE of 0.01334, an RMSE of
0.11552 and a Dstat of 78.57%. The experimental results are shown in Table 2. The next test varies the
probability values of crossover and mutation. In this test, the lowest MSE and RMSE are obtained when
the crossover probability is 0.9 and the mutation probability is 0.2. Over the whole series of
experiments, the best performance was obtained with a population of 45, a crossover probability of 0.9
and a mutation probability of 0.2, giving an MSE of 0.01334, an RMSE of 0.11552 and a Dstat of 78.57%,
as shown in Table 3.


Table 2. Experimental results by tuning population value 


Population  Gen  Generation  Crossover prob.  Mutation prob.  MSE      RMSE     Dstat
5           6    100         0.9              0.2             0.01563  0.12501  78.57%
15          6    100         0.9              0.2             0.01398  0.11825  78.57%
30          6    100         0.9              0.2             0.01469  0.12119  78.57%
45          6    100         0.9              0.2             0.01334  0.11552  78.57%
60          6    100         0.9              0.2             0.01388  0.11783  78.57%


Table 3. Experimental results by tuning crossover probability value 


Population  Gen  Generation  Crossover prob.  Mutation prob.  MSE      RMSE     Dstat
45          6    100         0.1              0.2             0.01456  0.12067  78.57%
45          6    100         0.2              0.2             0.01556  0.12475  78.57%
45          6    100         0.3              0.2             0.01543  0.12421  78.57%
45          6    100         0.4              0.2             0.01509  0.12283  85.71%
45          6    100         0.5              0.2             0.01444  0.12016  78.57%
45          6    100         0.6              0.2             0.01373  0.11716  78.57%
45          6    100         0.7              0.2             0.01544  0.12426  78.57%
45          6    100         0.8              0.2             0.01421  0.11919  78.57%
45          6    100         0.9              0.2             0.01334  0.11552  78.57%


It is worth mentioning that ANNs, a machine learning (ML) technique inspired by the human brain [23],
are a good choice for forecasting tasks because of their ability to generalise and find temporal
patterns in the training data. Although we chose an FNN, we could have chosen a recurrent neural network
(RNN), which might have produced better results because it takes into account the temporal aspects that
are intrinsic to time series data [24], [25]. However, we wanted to make use of the best of both ANNs
and GAs by using a GA as an optimiser for the weights of the neural network, as it accelerates the
learning process by better optimising hyper-parameters [26].

Testing of the proposed method shows that forecasting with an ANN optimized using GA gives better
results than an ANN without GA optimization. With GA optimization, the forecasting error is reduced by
37% for MSE and 21% for RMSE, as shown in Table 4. This shows that optimization with GA substantially
improves forecasting accuracy. A comparison of the actual data and the tourist visit forecasts using
the EEMD, ANN and GA methods is shown in Figure 4.


Table 4. Performance comparison 
Metric   With GA   Without GA   Reduction (%)
MSE      0.01334   0.02130      37
RMSE     0.11552   0.14591      21
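
The reduction percentages in Table 4 follow from the reported error values, taking the error without GA as the baseline:

```latex
\begin{align*}
\text{MSE reduction}  &= \frac{0.02130 - 0.01334}{0.02130} \times 100\% \approx 37\% \\
\text{RMSE reduction} &= \frac{0.14591 - 0.11552}{0.14591} \times 100\% \approx 21\%
\end{align*}
```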
Figure 4. Comparison of actual data and tourist visit forecasting


4. CONCLUSION 

This study has proposed forecasting tourist visits using a combination of ensemble empirical mode
decomposition and an artificial neural network optimized using GA. The GA is used to optimize the weight
values of the artificial neural network. The model was tested on tourist visit data from Sumenep
Regency, Indonesia. Experiments were carried out by analyzing the differences between the forecast
results of the proposed method and those of the EEMD-ANN method without GA optimization. The
experimental results show that the proposed method performs better: the forecasting error is reduced by
37% for MSE and 21% for RMSE. To further improve forecasting, other new optimization methods can be
explored in future research.